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Abstract 

We  developed  several  new  direetions  in  the  theory  and  applieations  of  Optimal  Mass  Transport 
(OMT).  OMT  has  its  origins  in  civil  engineering  (Monge  1781)  and  economics  (Kantorovich 
1942),  but  in  recent  years  has  increasingly  impacted  a  large  number  of  other  fields  (probability 
theory,  partial  differential  equations,  physics,  meteorology).  We  have  addressed  computational 
aspects  of  the  problem  and  the  need  for  further  expanding  the  arsenal  of  computational  tools.  We 
considered  a  wide  range  of  generalizations  and  insights  for  the  purpose  of  tackling  problems  of 
AFOSR  interest.  These  include  matrix-valued  statistics  and  fusion  of  information,  optical  flow, 
controlled  active  vision,  tracking  and  dynamic  textures. 

Duration: 

06/15/2012  —  09/15/2016 
Status/Progress 

We  have  accomplished  the  following  in  our  program  and  proposed  research: 

(i)  Computational  Tools  for  Optimal  Mass  Transport  (OMT)  :  We  developed  a  number  of  tools 
allowing  us  to  solve  several  problems,  including  the  construction  of  geodesics,  computation 
of  metric  distances,  and  transportation  means.  Such  constructions  are  motivated  by  a  variety 
of  engineering  applications.  Further,  we  have  exploited  the  beautiful  connection  between  the 
Boltzmann  entropy  and  the  heat  equation.  The  latter  arises  as  the  steepest  descent  when  max¬ 
imizing  the  Boltzmann  entropy  “potential”  in  the  Riemannian  metric  inherited  on  the  space 
of  probability  densities  via  OMT.  The  rate  of  ascent,  since  entropy  increases,  is  given  by 
the  Fisher  information  metric.  The  same  paradigm  has  been  used  to  recover/generate  other 
gradient  flows  (PDFs)  and  thereby  link  via  suitable  information  potentials  to  corresponding 
metrics  for  distributions. 

(ii)  Power  spectra:  Our  work  on  high  resolution  signal  analysis  has  led  to  a  number  of  novel 
notions  of  distance  between  power  spectra  with  applications  to  prediction  theory.  We  have 
investigated  the  topic  of  matrix-valued  power  spectra  as  it  relates  to  several  DoD  interests. 
In  this  regard,  prediction  theory  is  juxtaposed  with  developments  in  quantum  information 
theory,  creating  a  synergy  of  methodologies  for  signal  analysis. 
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(iii)  Tracking  and  Mesh  Generation:  We  have  continued  our  work  in  visual  tracking  and  con¬ 
trolled  active  vision.  Mass  transport  is  being  used  as  a  comparison  metric  on  shapes  as  part 
of  a  feedback  loop  for  tracking  in  conjunction  with  statistical  filtering.  Further,  for  vari¬ 
ous  problems  in  computational  fluid  dynamics,  biomechanics,  and  CAD,  we  have  developed 
novel  techniques  based  on  OMT  that  may  be  employed  for  the  automatic  generation  of  hex- 
ahedral  meshes  for  three  dimensional  volumes. 

(iv)  Optimal  transport  on  networks:  We  have  made  significant  advances  on  formulating  and 
solving  optimal  transport  problems  on  discrete  spaces  (networks)  while  ensuring  robustness 
of  the  transportation  plan.  This  work  makes  contact  with  a  probabilistic  formalism  of  bridg¬ 
ing  two  probability  distributions  along  path  of  a  random  walker  that  displays  the  two  given 
marginals.  The  development  builds  on  the  theory  of  the  so-called  Schrodinger  bridges  and 
opens  up  new  directions  for  investigating  geometries  of  mass  and  probability  distributions. 
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1  Introduction 


Optimal  mass  transport  is  a  major  research  area  with  applications  for  numerous  disciplines  in¬ 
cluding  econometrics,  fluid  dynamics,  automatic  control,  transportation,  statistical  physics,  shape 
optimization,  expert  systems,  and  meteorology  [52,  68].  The  problem  was  first  formulated  by 
the  civil  engineer  Caspar  Monge  in  1781,  and  concerned  finding  the  optimal  way,  in  the  sense 
of  minimal  transportation  cost,  of  moving  a  pile  of  soil  from  one  site  to  another.  Much  later  the 
problem  was  extensively  analyzed  by  the  Soviet  mathematician  Kantorovich  [35]  with  a  focus  on 
economic  resource  allocation,  and  so  is  now  known  as  the  Monge-Kantorovich  (MK)  or  optimal 
mass  transport  (OMT)  problem. 

A  major  problem  that  elucidates  how  OMT  is  employed  is  image  registration  [28].  Since  this 
appears  in  many  practical  systems,  tracking,  and  vision  applications,  we  will  briefly  explain  how 
OMT  may  be  used  to  treat  this  problem.  Registration  is  the  process  of  establishing  a  common 
geometric  reference  frame  between  two  or  more  data  sets  obtained  by  possibly  different  imaging 
modalities  and  at  different  times.  Registration  typically  proceeds  in  several  steps.  First,  a  measure 
of  similarity  between  the  data  sets  is  established,  so  that  one  can  quantify  how  close  an  image  is 
from  another  after  transformations  are  applied.  Such  a  measure  may  include  the  similarity  between 
pixel  intensity  values,  as  well  as  the  proximity  of  predefined  image  features  such  as  implanted 
fiducials,  anatomical  landmarks,  surface  contours,  and  ridge  lines.  Then  the  transformation  that 
maximizes  the  similarity  between  the  transformed  images  is  found.  Many  times  this  transforma¬ 
tion  is  given  as  the  solution  of  an  optimization  problem  where  the  transformations  to  be  considered 
are  constrained  to  be  members  of  a  predetermined  class.  Lastly,  once  an  optimal  transformation 
is  obtained,  it  is  used  to  fuse  the  image  data  sets.  Registration  has  a  huge  literature  devoted  to 
it  with  numerous  approaches  ranging  from  statistical  to  computational  fluid  dynamics  to  various 
types  of  warping  methodologies;  see  [66,  70].  One  way  of  defining  density  is  via  “intensity,”  and 
in  such  a  case  the  method  explicated  in  this  proposal  can  be  considered  an  intensity-driven  one. 
The  method  we  devised  as  part  of  our  AFOSR  funded  research,  is  also  in  the  class  of  warping 
strategies  based  on  continuum  and  fluid  mechanics,  in  which  one  tries  to  use  properties  of  elastic 
materials  to  determine  the  deformation.  Here  one  defines  a  (typically  quadratic)  cost  functional 
that  penalizes  the  mismatch  between  the  deforming  template  and  target.  A  key  fact  that  will  be 
employed  when  we  describe  a  Riemannian  metric  on  the  space  of  probability  densisites  is  that  the 
optimal  warping  map  of  OMT  may  be  regarded  as  the  velocity  vector  field  which  minimizes  a 
standard  energy  integral  subject  to  the  Euler  continuity  (mass  preservation)  equation  [4].  Thus,  the 
theory  of  OMT  allows  a  natural  geometry  on  the  space  of  distributions  and  suitable  warping  maps 
and  geodesics  which  establish  correspondence  between  distributions  and  may  be  used  suitably  in 
applications. 

With  this  background,  we  have  investigated  the  following  problems  in  our  just  completed 
AFOSR  research  program: 

(i)  Riemannian  Metrics  and  Gradient  Flows:  There  is  a  beautiful  connection  between  the 
Boltzmann  entropy  and  the  heat  equation.  The  latter  arises  as  the  steepest  descent  when 
maximizing  the  Boltzmann  entropy  “potential”  in  the  Riemannian  metric  inherited  by  the 
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OMT.  The  rate  of  ascent,  since  entropy  increases,  is  given  by  the  Fisher  information  met¬ 
ric.  The  same  paradigm  can  be  used  to  recover/generate  other  gradient  flows  (PDFs)  and 
thereby  link  via  suitable  information  potentials  to  corresponding  metrics  for  distributions.  In 
particular,  we  recover  the  affine  invariant  heat  equation  and  draw  connections  with  metrics 
on  power  spectra.  This  topic  is  primarily  of  theoretical  interest  with  potentially  important 
insights  into  the  qualities  of  various  alternative  metrizations  for  the  space  of  distributions. 

(ii)  Optical  Flow,  Tracking,  and  Mesh  Generation:  Optical  flow  is  a  key  problem  in  controlled 
active  vision  and  thus  in  visual  tracking.  The  optical  flow  field  is  defined  as  the  velocity 
vector  field  of  apparent  motion  of  brightness  patterns  in  a  sequence  of  images.  There  have 
been  many  methods  proposed  for  its  computation.  We  have  shown  that  show  that  ideas  from 
optimal  mass  transport  are  ideal  for  the  computation  of  the  optical  flow  field  for  scenar¬ 
ios  involving  dynamic  textures,  that  is,  for  objects  that  have  internal  dynamics  such  as  fire 
and  smoke.  Further,  for  various  problems  in  computational  fluid  dynamics,  biomechanics, 
and  CAD,  we  have  shown  that  techniques  from  OMT  may  be  employed  for  the  automatic 
generation  of  hexahedral  meshes  for  three  dimensional  volumes. 

2  Summary  of  Work 

We  summarize  some  of  the  key  results  developed  as  part  of  our  AFOSR  research  program.  Optimal 
mass  transport  (OMT)  and  the  mathematics  that  were  spawn  from  the  Monge-Kantorovich  problem 
have  impacted  a  number  of  fields  including  probability  theory,  statistics,  physics,  the  atmospheric 
sciences,  economics,  and  functional  analysis;  e.g.,  see  [2,  4,  52].  Our  research  focused  around 
around  problems  in  information  fusion  and  control,  and  image  registration. 

2.1  Geometry  of  optimal  mass  transport 

Consider  once  again  the  OMT  problem.  In  our  program,  we  studied,  in  particular,  costs  of  the  form 
p{u,  x)  =  \u  —  x\^  (p  >  1),  giving  rise  to  the  Kantorovich-Wasserstein  metric 

dp{po,  piY  :=  inf  /  p,o{x)\u{x)  —  xf  dx.  (1) 

u  e  MP  J 

Thus,  an  optimal  MP-map,  when  it  exists,  is  one  which  minimizes  the  above  integral.  The  integral 
represents  a  cost  on  the  distance  the  map  u  moves  each  bit  of  material,  weighted  by  the  respective 
mass. 

The  case  p  =  2  has  been  extensively  studied  in  recent  years.  A  fundamental  result  by  Yann 
Brenier  [10]  is  that  there  is  a  unique  optimal  u  G  MP  transporting  po  to  pi,  and  that  this  u  is 
characterized  as  the  gradient  of  a  convex  function  w,  i.e.,  u  =  Vg  where  g  can  be  thought  as  a 
convex  potential.  The  geometric  significance  of  this  can  be  traced  to  the  fact  that  for  a  transference 
plan  to  be  optimal,  there  should  be  no  “crossing”  of  paths  that  individual  specs  of  mass  take.  This 
insight  [13,  40]  forces  the  graph  of  the  optimal  plan  to  have  a  certain  cyclically  monotone  property, 
which  then,  by  a  theorem  of  Rockafeller  [59],  implies  that  it  is  the  (sub-)differential  of  a  convex 
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function.  The  novelty  of  this  result  is  that  very  much  like  the  Riemann  mapping  theorem  in  the 
plane,  OMT  singles  out  particular  maps  with  preferred  geometry. 

It  is  interesting  to  speculate  whether  a  similar  geometric  insight  is  relevant  in  optimal  multivari¬ 
able  couplings  amongst  more  than  one  distribution.  Such  problems  will  be  motivated  later  on  in 
the  context  of  fusion  of  information  from  various  sources/sensors.  Recent  works  [60,  24]  motivate 
analogous  questions  but  rather  from  an  economics  perspective.  Thus,  a  question  of  potential  great 
relevance  is  to  study  in  a  similar  manner  (research  direction)  the  geometric  properties  of  solutions 
to  the  multi-transport  problem  min^  fJ^i)  for  a  given  set  of  distributions  /ij,  and  develop 

computational  tools  for  this  problem. 


2.1.1  Optimal  mass  transport  as  an  optimal  control  problem 

The  Monge-Kantorovich  problem  for  p  =  2  may  be  formulated  as  follows  [4].  Consider 


inf 


/i(t,  x)\\'V g{t,  x)\\‘^  dt  dx 


over  all  time  varying  densities  /i  and  functions  g  satisfying 

(9/i 


dt 


+  div(/rVp)  =  0, 

/i(0,  •)  =  /io,  /i(l,  •)  =  AH- 


(2) 


(3) 


The  integrand  in  (2)  may  be  thought  to  represent  kinetic  energy  with  u  =  Wg  representing  velocity. 
Thus,  (2)  is  an  “action”  integral  in  the  way  understood  in  the  physics  literature.  One  may  then  show 
that  the  infimum  is  attained  for  some  Umin  and  gmin,  accordingly  we  set  Umin  =  '^gmin-  Further, 
define  the  flow 

X{x,t)  =  X  +  t{Ujnin{x)  -  x) . 

Note  that  when  f  =  0,  X  is  the  identity  map  and  when  t  =  1,  it  is  the  solution  Umin  to  the  Monge- 
Kantorovich  problem.  This  analysis  provides  appropriate  justification  for  using  (2.1.1)  to  define 
a  continuous  (nonlinear)  warping  map  X  between  the  densities  po  and  pi.  Besides  the  relevance 
of  such  a  warping  for  applications  such  as  image  registration  [28],  voice  morphing  [33],  etc.,  the 
analysis  above  provides  a  Riemannian  structure  on  the  space  of  distributions  that  we  take  up  next. 


2.1.2  Riemannian  structure  of  density  functions 

Define  the  space  of  density  functions  as 

C  -.=  {n  >  0  J  jj,  =  1}. 

The  tangent  space  at  a  given  “point”  p  may  be  identified  with 

T^C  =  (n  ;  J  v  =  0}. 
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Thus,  inspired  by  the  Benamou-Brenier  framework  [4],  given  two  “points”  /io,  ^  C,  the  geodesic 

distance  is: 


inf{ /"  [  fi(t,x)\\'Vg(t,x)\\'^  dtdx 
J  Jo 

Qn 

subject  to  —  +  div(/iV5')  =  0, 

/i(0,  ■)  = /To,  /i(l,  •)  = (4) 

In  other  words,  we  look  at  all  curves  in  C  connecting  /xq  and  /xi,  and  take  the  shortest  one  with 
respect  to  the  Wasserstein  metric.  This  leads  us  to  give  C  a  Riemannian  structure,  which  will 
induce  the  Wasserstein  distance.  This  idea  is  in  fact  due  to  Jordan  et  al.  [34].  Namely,  under 
suitable  assumptions  on  differentiability  for  /r  G  C,  and  v  G  T^C,  one  solves  the  Poisson  equation 

V  =  — div(/iV(y').  (5) 

This  allows  us  to  identify  the  tangent  space  with  functions  g  up  to  additive  constant.  Thus,  for  any 
given  V  we  denote  the  solution  of  (5)  by  g^.  Then  given,  vi,V2  G  T^C,  we  can  define  the  inner 
product 

{vi,V2)t,  :=  j  (6) 

An  integration  by  parts  argument,  shows  that  this  inner  product  will  exactly  induce  the  Wasserstein 
distance  given  by  Equation  (4).  It  is  very  suggestive  to  also  note  that 


J  /iV5f„  ■  Vgv 


J  gydw{iJ,Vgv)  (integration  by  parts) 


/ 


vgv 


(7) 


Several  interesting  questions  were  explored  explored  in  detail.  For  instance,  typically,  action 
integrals  f  (T  —  V)  in  physics  have  both  a  term  corresponding  to  the  kinetic  energy  T  and  one 
corresponding  to  a  potential  V.  In  our  AFOSR  work,  in  several  publications,  we  have  studied  the 
implication  of  a  potential  term  to  the  action  integral  (2)  and  how  this  affected  the  induced  geometry. 


2.2  Riemannian  metrics  &  gradient  flows 

The  availability  of  a  natural  metric  structure  on  the  space  C  of  distributions  suggested  a  range 
of  interesting  theoretical  questions  that  were  investigated  in  our  program.  This  line  of  research 
has  led  to  the  the  rather  deep  and  surprising  fact  that  gradient  flows  of  the  Boltzmann  entropy  in 
the  geometry  of  Wasserstein-Kantorovich  metric  give  rise  to  the  heat  equation  [34] .  A  similar 
approach  gives  rise  to  the  affine-invariant  nonlinear  heat  equation  (co-discovered  by  one  of  the 
Pi’s  (AT))  which  has  been  of  great  significance  and  popularity  in  image  processing  [61,  62],  and 
connections  are  drawn  with  our  recent  work  on  a  differential-geometric  structure  based  on  optimal 
prediction  theory  for  spectral  density  functions  [21].  Thus,  the  geometry  of  OMT  links  together  the 
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Boltzmann  entropy,  the  heat  equation  and,  as  we  will  see  in  the  next  section,  the  Fisher  information 
metric.  In  a  similar  way,  the  use  of  an  information  potential  other  than  the  Boltzmann  entropy  can 
give  rise  to  alternative  gradient  flows  and  metrics  that  relate  to  affine-invariance  [61],  prediction 
theory  [21],  flow  in  porous  media  and  many  other  important  paradigms  in  physics  [68].  In  our 
AFOSR  research,  we  have  elucidated  the  relationship  between  mass  transport,  conservation  laws, 
entropy  functionals,  on  one  hand  and  probability  and  power  distributions  and  related  metrics  on  the 
other.  This  was  a  significant  undertaking  that  was  a  focus  of  the  project  and  has  been  accomplished 
and  reported  in  several  of  the  publications  that  resulted  under  the  present  grant  (see  e.g.,  [63,  46, 
8]). 


2.2.1  Boltzmann  Entropy  and  the  heat  equation 

We  consider  the  Boltzmann  entropy 


^:  = 


j  ft  log /i 


as  an  “information”  potential  and  evaluate  S  along  a  1-parameter  family  in  C, 
reserving  Ht  for  its  derivative.  Integrals  are  with  respect  to  the  spatial  variable 
S  with  respect  to  t  is 


dS 

dt 


j  log/i-f /Xt) 


j  (/^i  log/i), 


fi(t,  x)  or  simply  /i, 
X.  The  derivative  of 

(8) 


since  /  /r  =  1.  In  view  of  the  characterization  of  the  Wasserstein  norm  from  Equation  (7),  we  see 
that  the  steepest  gradient  direction  {with  respect  to  the  Wasserstein  metric)  is  given  by 


Pt  =  div(/iV  log  p)  =  A/r, 


which  is  precisely  the  linear  heat  equation.  Finally,  if  we  substitute  pt  =  A/r  into  Equation  (8), 
and  integrate  by  parts,  we  get 

Hence  the  rate  of  entropy  increase  is  given  by  the  Fisher  information  metric! 


2.2.2  Information  metrics 

In  an  analogous  manner  we  have  explored  the  implications  of  the  OMT  geometry  on  the  space  of 
power  spectra  of  stochastic  processes.  Eor  simplicity  we  assume  herein  that  power  spectra  have 
the  same  total  energy  and  thence  are  normalized  to  have  integral  1.  The  principal  reason  why  the 
OMT  geometry  is  natural  in  the  spectral  domain  is  that  the  Wasserstein  metric  is  weak* -continuous 
(generating  the  natural  topology,  like  the  Eevy-Prohorov  metric^)  and  computationally  tractable 
[23]. 

'Note  that  in  probability  “weak”  often  refers  to  “weak*”.  The  latter  terminology  is  the  correct  one  from  a  functional 
analysis  viewpoint. 
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We  now  let  /i  represent  the  power  spectral  density  of  a  discrete-time  random  process,  though 
extension  to  being  a  non-absolutely-continuous  spectral  measure  is  also  possible.  Thus,  /r  is  taken 
to  be  non-negative  on  the  unit  circle  -  herein  identified  with  [0, 1),  with  /i(0)  =  /r(l).  The  integral 
f  11  which  represents  the  variance  of  the  underlying  random  process  is  normalized  to  1.  It  is  natural 
to  take  as  “information”  potential  the  differential  entropy 


Sd-.= 


I 


logp 


as  this  relates  to  the  variance  of  the  optimal  one-step-ahead  prediction  error  which  is  simply 
We  evaluate  Sd  along  a  1 -parameter  family  and  take  the  derivative 


dSd  _  f  Pt 

dt  J  d 

The  steepest  gradient  direction  with  respect  to  the  Wasserstein  metric  is  now  given  by 


(10) 


/it  = -div(/iV-)  =  div(  — )  = - — .  (11) 

d  d  d  d 

This  is  yet  another  nonlinear  heat  equation.  We  specialize  to  one  (spatial)  dimension  (V/i  =:  dx 
is  now  simply  the  partial  derivative  with  respect  to  x)  and  we  write  Equation  (11)  explicitly  using 
partials  with  respect  to  this  one  dimension 


dt 


Upon  substitution  into  (10),  and  integration  by  parts  we  obtain  that 

f  idx? 

dt  J  d^ 


(12) 


2.2.3  Affine-invariant  heat  equation 

It  is  natural  to  consider  the  general  class  of  potentials 

=  - 1  /W. 

where  /  is  a  suitable  differentiable  increasing  function.  Then  the  exact  computation  given  earlier 
shows  that  the  corresponding  gradient  flow  with  respect  to  the  Wasserstein  metric  is 

dt  =  dividend))-  (13) 

Moreover, 
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As  an  application,  if  we  take  f{x)  =  n  >  0,  then  (13)  becomes  =  A/i”,  which  in 

one  spatial  variable  reduces  to 

^xx- 

Interestingly,  this  can  now  be  turned  into  an  equation  of  the  form 

ut  =  {u^xT  (14) 

as  follows.  Simply  apply  — (i.e.,  the  negative  inverse  of  the  Laplacian)  to  both  sides,  and  set 
u  =  —  A/i.  For  n  =  1/3,  we  get  the  1-dimensional  affine  invariant  heat  equation  [61,  3]  of  great 
popularity  in  image  processing.  While  this  equation  is  known  not  to  be  derivable  via  any  L^-based 
gradient  flow  [50,  51],  we  have  just  shown  that  it  can  also  be  derived  as  such  via  the  Wasserstein 
geometric  structure.  The  Euclidean  invariant  geometric  heat  equation  may  be  derived  via  an 
gradient  descent  flow  [26]. 

Our  main  interest  however  is  in  natural  metrics  between  distributions.  To  this  end,  we  have 
explored  several  new  metrics  that  may  be  derived  using  the  framework  introduced  here.  Indeed, 
following  [4]  again,  we  can  devise  geodesic  distances  analogous  to  action  integrals  (for  “pressure¬ 
less  fluid  flow”).  Such  integrals  may  be  taken  in  the  form: 

inf  j  J  fi(t,x)h{v(t,x))  dtdx  (15) 

over  all  time-varying  densities  fx  and  vector  fields  v  satisfying 

^  +  div(/in)  =  0, 

/i(0,  •)  fXQ,  /i(l,  •)  /il- 

Here  his  a  strictly  convex  even  function.  (In  [4],  h{v)  =  ||w|p/2.)  One  can  show  that  this  leads  to 
the  flow 

/rt  =  div(/rV/i*(V/'(/r))),  (16) 

where  h*  is  the  Legendre  transform  of  h.  We  have  considered  and  detailed  this  framework  and  its 
connections  with  fundamental  concepts  of  information  and  prediction  theory  in  [63]. 

2.2.4  Unbalanced  densities 

General  distributions  (histograms,  power  spectra,  spatio-temporal  energy  densities,  images)  may 
not  necessarily  be  normalized  to  have  the  same  integral.  Thus,  it  is  very  important  to  devise 
appropriate  metrics  and  theory.  Our  aim  is  to  provide  constructions  for  “interpolating”  data  in  the 
form  of  distributions.  Thus,  we  seek  to  view  such  in  a  natural  metric  space.  The  first  candidate 
is  the  space  of  L^-integrable  functions.  However,  as  we  will  note  next,  geodesics  are  simply 
linear  intervals  and  fail  to  have  a  number  of  desirable  properties  [23,  33].  In  particular,  the  “linear 
average”  of  two  unimodal  distributions  is  typically  bimodal.  Thus,  important  features  are  typically 
“written  over”.  It  is  instructive  for  us  to  consider  this  first. 
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One  can  show  that 


c?L2(/^o,/Wi)^  =  inf 


IJs,V 


\dtfj,(t,  x)\^  dt  dx 


over  all  time  varying  densities  jj,  and  vector  fields  v  satisfying 

^  +  div(/in)  =  0, 

/i(0,-)  =  /io,  /x(l,  •)  = /ii- 


(17) 


(18) 


The  optimality  condition  for  the  path  is  then  given  by 

dtt^{t,x)  =  0, 

which  gives  as  optimal  path  the  “interval”  (t  G  [0, 1]) 

fi(t,  x)  =  [fj.i{x)  -  fio{x)]t  +  fio{x). 


Our  claim  about  bi-modality  of  a  mix  of  two  unimodal  distributions  is  evident.  On  the  other  hand, 
OMT  geodesics  represent  nonlinear  mixing  and  have  a  considerably  different  character  [33]. 

Yet,  the  problem  may  be  used  in  conjunction  with  OMT  in  case  of  unbalanced  mass  distri¬ 
butions.  Indeed,  given  the  two  unbalanced  densities  /io  and  /ii  it  is  natural  to  seek  a  distribution  jli 
the  closest  density  to  fxi  in  the  sense,  which  minimizes  the  Wasserstein  distance  (iwass(Ato,  /li)^- 
The  perturbation  can  be  interpreted  as  “noise.”  One  can  then  show  that  this  problem  amounts 
to  minimizing 


inf 


fi(t,x)\\v\\‘^  dt  dx  +  a/2  /  \fii{x)  —  fii{x)\‘^  dx 


over  all  time  varying  densities  /i  and  vector  fields  v  satisfying 

^  +  div(/in)  =  0, 

/i(0,  •)  /iO;  /^(l)  ■)  ■ 


(19) 


(20) 


This  idea  has  been  taken  further  in  [23,  33]  where  the  two  end  points  fio,  are  allowed  to  be 
perturbed  slightly  into  flo,  jli  while  the  perturbation  equalizes  their  integrals  and  is  accounted  for 
in  the  metric.  This  leads  to  a  modified  Monge-Kantorovich  problem  which  is  best  expressed  in 
terms  of  its  dual.  Recall  that  the  original  OMT-problem  of  transferring  (balanced)  /xq  into  /ii  has 
the  following  dual  (see  e.g.,  [68]) 


max 

4>{x)+i^{y)<p{x,y)  , 


(j){x)fj,oix)dx  +  J  ^/J{y)f^l{y)dy, 

which  for  the  case  p{x,  y)  =  \x  —  y  \  can  be  shown  to  be 


max 

ll</>llLip<l 


j  0(/^O  “  /^l) 


(21) 
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with  II 011  Lip  =  sup  Lipschitz  norm  (see  [68]).  Replacing  hq,  pi  by  po,  jli  in  the  OMT 

problem  while  penalizing  the  magnitude  of  the  errors  H/Xj  —  /ij||  leads  to  the  following  metric  [23]: 

2 

inf  di{djlo,  dJjLi)  +  ny  dTvid^i,  djli).  (22) 

fio{n)=fii{n) 

2  =  1 

This  metric  in  fact  “interpolates”  the  total  variation  (which  is  not  weak*  continuous)  and  the 
Wasserstein  distance.  The  above  expression  has  an  interesting  physical  interpretation,  where  /i/s 
are  noisy  version  of  the  /ij’s  and  considerable  practical  significance,  as  it  allows  comparing  distri¬ 
butions  of  unequal  mass  in  a  natural  way.  Further,  this  is  weak*  continuous.  The  dual  formulation 
is  particularly  simple: 

*^unbalanced(/^0)  A^l)  •  maX  /  0(p,g  P-i).  (23) 

||0||Lip  <  1 

II0IIOO  <  c 

The  constant  c  depends  on  how  much  penalty  k  is  placed  on  ||/ii  —  fli\\ 2-  Geodesics  of  this  metric 
retain  several  of  the  features  at  the  end  points  [23,  33]  but,  it  is  also  substantially  more  difficult  to 
compute  (distances,  geodesics,  etc.).  Parallel  work  on  the  geometry  for  unbalanced  mass  distri¬ 
butions  using  substantially  different  tools  and  direction  has  been  pursued  in  [5,  14].  Besides  our 
interest  in  developing  natural  metrics  for  unbalanced  mass  distributions,  our  work  exemplifies  the 
focus  in  this  respect:  develop  computational  tools  for  (weak*)  metrics  between  unbalanced  mass 
distributions  and,  in  particular,  for  (23).  See  also  [47]  for  our  recent  development  of  a  general 
framework  and  specific  results  under  the  grant  for  transport  of  matrix- valued  distributions. 

2.2.5  Tracking 

One  of  the  benefits  of  a  geometric  framework  is  the  availability  of  geodesics  and  geodesic  paths  for 
linking  together  time- varying  spectra.  This  is  in  complete  analogy  with  the  usage  of  approxima¬ 
tion  techniques  in  tracking  dynamic  changes  in  traditional  system  identification.  Herein,  spectral 
geodesics  represent  a  tool  for  tracking  changes  in  the  spectral  domain.  There  are  several  possible 
natural  geometries.  Here  we  continue  on  the  one  which  we  discussed  in  the  previous  section. 

It  can  be  shown  (see  [31])  that  the  geodesic  distance  between  two  matrix- valued  power  spectral 
density  functions  po,  jUi  is 

(24) 

with  II  ■  llpr  denoting  the  Frobenius  norm.  Thus,  (24)  is  a  matricial  generalization  of  the  commonly 
used  logarithmic  deviation,  and  shows  that  the  latter  is  in  fact  a  meaningful  geometric  quantity. 
Further,  the  geodesic  path  between  the  two  is 

1/2/  -1/2  -l/2u  1/2 

=  /io  (/^o  /^i/^o  )  >  (25) 

where  t  G  [0, 1].  Likewise,  using  (25)  we  can  construct  geodesics  between  matrix- valued  power 
spectra.  Spectral  geodesics  represent  an  effective  non-parametric  model  for  nonstationarity  [33]. 
Recent  results  along  this  line  for  matrix- valued  geometric  transport  are  reported  in  [46,  15]. 
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2.2.6  Matricial  Fisher  information 


Metrics  between  matrix-valued  distributions  which  generalize  the  Fisher  information  have  also 
been  developed  for  the  purposes  of  assessing  information  in  the  setting  of  quantum  mechanics. 
Briefly,  the  scalar  Fisher  information  metric  is  of  the  form  f  The  “quantum”  analog,  where  the 
perturbation  A  is  a  matrix  as  is  the  distribution  M,  the  Fisher  information  metric  takes  the  form 

trace  (A  ■  Dm(A)) 

where  Dm(A)  is  a  “super-operator”  representing  division  of  A  by  M.  Several  options  for  this 
“non-commutative”  division  exist.  For  instance,  Y  =  Dm  (A)  can  be  selected  to  satisfy  ^{YM  + 
MY)  =  A,  but  also  Y  =  M-^AAfi,  or  /“(/  +  aM)-^A{I  +  aM)-^da  (see  e.g.,  [56,  38], 
and  [16,  Appendix]  and  the  references  therein).  There  are  strong  connections  between  these  met¬ 
rics,  which  constitute  a  family  “non-commutative”  Fisher  infromation  metrics,  and  the  “predictive 
geometry”  we  discuss  in  the  previous  section.  Our  relevant  work  that  focuses  on  multivariable 
covariance  statistics,  under  AFOSR  support,  is  [17,  18,  19,  20]  and  our  more  recent  development 
is  detailed  in  [31,  48,  32,  49,  15]. 

2.3  Unbalanced  OMT  optical  flow  for  dynamic  textures 

Optical  flow  is  a  computational  procedure  to  compute  the  motion  between  a  set  of  images,  taken 
within  a  short  time  difference.  The  main  idea  is  that  the  gray  values  of  each  image  do  not  change 
between  two  images.  This  leads  to  the  optical  flow  constraint 

It  +  u  ■  V/  =  0.  (26) 

where  /  is  the  image  and  u  =  [u,  v]  is  the  flow  field.  Given  two  images  taken  in  a  short  time 
interval,  it  is  possible  to  solve  for  the  optical  flow  field  u  by  solving  the  following  optimization 
problem 


min  \\It  +  u  ■  VI\\‘^  +  aR{u)  (27) 

U 

where  R{u)  is  a  regularization  operator,  typically  chosen  to  be  the  gradient  of  u  and  a  is  a  regular¬ 
ization  parameter. 

The  underlying  assumption  this  model  is  one  of  intensity  constancy.  Under  this  assumption 
an  objects  brightness  is  constant  from  frame  to  frame.  This  assumption  holds  for  rigid  objects 
with  a  Lambertian  surface,  but  fails  for  fluid  and  gaseous  materials.  In  computer  vision,  these  are 
modeled  by  so-called  dynamic  textures  (see  [11]).  The  dynamic  textures  typical  of  smoke  and  fire 
possess  intrinsic  dynamics  and  so  cannot  be  reliably  captured  by  the  standard  optical  flow  method. 
Also,  the  fire/smoke  region  tends  to  flow  much  faster  than  the  area  around  it  which  again  may 
cause  the  model  to  produce  erroneous  results. 

Our  goal  in  this  research  program  has  been  to  obtain  better  optical  flow  field  models  for  fire  and 
smoke  modeled  as  dynamic  textures.  One  way  to  do  so  is  to  base  the  optical  flow  on  the  physical 
attributes  of  these  processes.  One  simple  attribute  is  that  fire  and  smoke  tends  to  approximately 
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conserve  intensity  taken  as  a  generalized  mass  and  move  the  mass  in  an  optimal  way.  Thus,  an 
appropriate  mathematical  optical  constraint  is  not  intensity  preserving  but  rather  mass  preserving. 
This  model  can  be  written  as 


It  +  div  ul  =  0.  (28) 

Our  model  for  optimal  flow  is  the  one  that  minimizes  the  total  energy  defined  as  follows: 


1 


inf 

Jo 

subject  to 


\u\^  dx  dt+a/2  /  {jli{x)—^i{x)Ydx-\-II/2 


(/Tt+div  dx  dt.  (29) 


n  Jo 


fi{x,0)  =  fj,o{x),  ^{x,l)  =  fii{x). 


Our  results  have  been  presented  in  several  publications,  see  e.g.,  [43]. 


2.4  Mass  Preserving  Maps  of  Minimal  Distortion 

Both  conformal  (angle  preserving)  and  mass  preserving  diffeomorphisms  are  of  great  interest  in 
surface  deformations,  and  therefore  in  image  registration  and  template-based  tracking  algorithms; 
see  [1,  30].  It  is  obviously  very  important  to  preserve  as  much  geometric  structure  (both  local  and 
global)  in  the  deformation  map.  We  have  considered  ways  of  finding  area  preserving  maps  which 
distort  shape  minimally  in  the  following  sense. 

Let  A4  and  J\f  be  two  compact  surfaces  of  genus  0  equipped  with  Riemannian  metrics  h  and  g, 
respectively,  and  let  0  :  — )■  A/”  be  an  area  preserving  map  (i.e.,  if  Hg  and  Hh  are  the  area  forms, 

then  (j)*  (Hg)  =  flft).  Once  you  have  0  there  are  many  other  area  preserving  maps  from  M  to  M 
(just  compose  0  with  any  other  area  preserving  ^  :  A/”  — )■  A/” ).  We  are  interested  in  finding  the  one 
which  has  minimal  distortion. 

One  possible  answer  is  to  try  to  find  a  map  ip  o  cf)  which  minimizes  the  Dirichlet  integral 

v[4>\  =  \l  (30) 

^  Jm 

We  have  already  written  down  possible  steepest  descent  flows  which  would  deform  an  area  pre¬ 
serving  map  (p  :  M  ^  Af  within  the  class  of  area  preserving  maps  to  a  critical  point  (most  likely 
a  local  minimum)  of  V.  In  the  classical  Dirichlet  problem,  one  derives  a  conformal  map  as  the 
minimizer  of  such  an  integral.  In  the  present  case,  we  are  minimizing  the  Dirichlet  integral  over  a 
more  restrictive  class  of  maps  to  get  area  preservation  with  the  least  angular  distortion. 

We  have  considered  gradient  descent  flows  for  minimizing  (30)  including  existence  and  unique¬ 
ness.  Indeed,  even  though  short  time  existence  does  not  follow  from  the  standard  theory  on 
parabolic  systems,  Hamilton’s  approach  using  the  Nash-Moser  implicit  function  theorem  is  ap¬ 
plicable,  and  leads  to  the  soughtafter  result. 
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